The phylogenetic Kantorovich-Rubinstein metric for environmental sequence samples.

Authors

  • Steven N Evans
  • Frederick A Matsen
Abstract

It is now common to survey microbial communities by sequencing nucleic acid material extracted in bulk from a given environment. Comparative methods are needed that indicate the extent to which two communities differ given data sets of this type. UniFrac, which gives a somewhat ad hoc phylogenetics-based distance between two communities, is one of the most commonly used tools for these analyses. We provide a foundation for such methods by establishing that, if we equate a metagenomic sample with its empirical distribution on a reference phylogenetic tree, then the weighted UniFrac distance between two samples is just the classical Kantorovich-Rubinstein, or earth mover's, distance between the corresponding empirical distributions. We demonstrate that this Kantorovich-Rubinstein distance and extensions incorporating uncertainty in the sample locations can be written as a readily computable integral over the tree; we develop $L^p$ Zolotarev-type generalizations of the metric; and we show how the p-value of the resulting natural permutation test of the null hypothesis 'no difference between two communities' can be approximated by using a Gaussian process functional. We relate the $L^2$ case to an analysis-of-variance type of decomposition, finding that the distribution of its associated Gaussian functional is that of a computable linear combination of independent $\chi^2_1$ random variables.
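The "integral over the tree" mentioned in the abstract can be illustrated with a small sketch. It assumes each sample is a set of point masses sitting exactly on tree nodes (no placement uncertainty), in which case the integrand is constant along each edge and the integral reduces to a sum over edges of edge length times the absolute difference in the mass lying below that edge. The function name `kr_distance`, the dictionary-based tree encoding, and the toy tree are all hypothetical choices made for illustration, not the authors' implementation; the exponent `p` is meant to mimic the $L^p$ generalization for p >= 1.

```python
from collections import defaultdict


def kr_distance(parent, length, p_mass, q_mass, p=1.0):
    """Kantorovich-Rubinstein-style distance between two mass
    distributions on a rooted tree (illustrative sketch).

    parent : dict mapping each non-root node to its parent
    length : dict mapping each non-root node to the length of the
             edge joining it to its parent
    p_mass, q_mass : dicts mapping nodes to probability mass
    p      : exponent (assumed >= 1); p = 1 is the earth mover's case
    """
    children = defaultdict(list)
    for child, par in parent.items():
        children[par].append(child)

    # The root is the only node that appears as a parent but never as a child.
    (root,) = set(parent.values()) - set(parent)

    # Pre-order traversal; reversing it visits children before parents.
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(children[node])

    diff = {}      # diff[node] = P(subtree at node) - Q(subtree at node)
    total = 0.0
    for node in reversed(order):
        d = p_mass.get(node, 0.0) - q_mass.get(node, 0.0)
        d += sum(diff[c] for c in children[node])
        diff[node] = d
        if node in parent:  # every non-root node owns the edge above it
            total += length[node] * abs(d) ** p
    return total ** (1.0 / p)


# Toy tree: R -> A (1.0), R -> I (0.5), I -> B (1.0), I -> C (2.0)
parent = {"A": "R", "I": "R", "B": "I", "C": "I"}
length = {"A": 1.0, "I": 0.5, "B": 1.0, "C": 2.0}
sample_p = {"A": 1.0}            # all reads placed at leaf A
sample_q = {"B": 0.5, "C": 0.5}  # reads split between leaves B and C

distance = kr_distance(parent, length, sample_p, sample_q)
print(distance)
```

In this toy case the result can be checked by hand: moving half the mass from A to B costs 0.5 x 2.5 and moving the other half from A to C costs 0.5 x 3.5, for a total of 3.0, which matches the edge-by-edge sum.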


Related resources

Extreme points of a ball about a measure with finite support

We show that, for the space of Borel probability measures on a Borel subset of a Polish metric space, the extreme points of the Prokhorov, Monge-Wasserstein and Kantorovich metric balls about a measure whose support has at most n points, consist of measures whose supports have at most n+2 points. Moreover, we use the Strassen and Kantorovich-Rubinstein duality theorems to develop representation...


EMDUnifrac: Exact Linear Time Computation of the Unifrac Metric and Identification of Differentially Abundant Organisms

Both the weighted and unweighted Unifrac distances have been very successfully employed to assess if two communities differ, but do not give any information about how two communities differ. We take advantage of recent observations that the Unifrac metric is equivalent to the so-called earth mover’s distance (also known as the Kantorovich-Rubinstein metric) to develop an algorithm that not only...


Optimal Couplings of Kantorovich-Rubinstein-Wasserstein Lp-distance

Supported by Zhejiang Provincial Education Department Research Projects (Y201016421). Abstract: We show that the optimal solutions for the Kantorovich-Rubinstein-Wasserstein Lp-distance (p > 2) (abbreviated KRW Lp-distance) in a bounded region of the Euclidean plane satisfy a partial differential equation. We also obtain similar results for the Monge-Kantorovich proble...


Imaging with Kantorovich-Rubinstein Discrepancy

We propose the use of the Kantorovich-Rubinstein norm from optimal transport in imaging problems. In particular, we discuss a variational regularisation model endowed with a Kantorovich-Rubinstein discrepancy term and total variation regularization in the context of image denoising and cartoon-texture decomposition. We point out connections of this approach to several other recently proposed me...


Simulation Hemi-metrics between Infinite-State Stochastic Games

We investigate simulation hemi-metrics between certain forms of turn-based 2½-player games played on infinite topological spaces. They have the desirable property of bounding the difference in payoffs obtained by starting from one state or another. All constructions are described as the special case of a unique one, which we call the Hutchinson hemi-metric on various spaces of continuous pre...



Journal title:
  • Journal of the Royal Statistical Society. Series B, Statistical methodology

Volume 74, Issue 3

Pages -

Publication date: 2012